Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 6717 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 682.3 KiB |
| Average record size in memory | 104.0 B |
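The overview figures above can be approximated for any table with a few lines of plain Python. This is a minimal sketch, not the profiler's own code: `table_overview` and `toy_rows` are hypothetical names, missing cells are represented by `None`, and a duplicate row is assumed to mean any row that repeats an earlier one.

```python
def table_overview(rows):
    """Summarise a table given as a list of equal-length tuples (None = missing cell)."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    n_cells = n_rows * n_cols
    missing_cells = sum(cell is None for row in rows for cell in row)
    # Assumption: a "duplicate row" is any row that repeats an earlier one.
    duplicate_rows = n_rows - len(set(rows))
    return {
        "variables": n_cols,
        "observations": n_rows,
        "missing_cells": missing_cells,
        "missing_cells_pct": 100 * missing_cells / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicate_rows,
        "duplicate_rows_pct": 100 * duplicate_rows / n_rows if n_rows else 0.0,
    }


# Hypothetical toy data, not taken from the profiled dataset.
toy_rows = [
    ("Maruti Alto LXi", 2012, "Petrol"),
    ("Maruti Alto LXi", 2012, "Petrol"),  # repeats the first row
    ("Hyundai i20 Magna", 2013, None),    # one missing cell
]
overview = table_overview(toy_rows)
```

For the profiled dataset, both the missing-cell and duplicate-row counts are reported as 0.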
Variable types
| Numeric | 8 |
|---|---|
| Categorical | 5 |
Alerts
| Alert | Type |
|---|---|
| name has a high cardinality: 1982 distinct values | High cardinality |
| year is highly correlated with selling_price and 1 other fields | High correlation |
| selling_price is highly correlated with year and 1 other fields | High correlation |
| km_driven is highly correlated with year | High correlation |
| seats is highly correlated with engine CC | High correlation |
| engine CC is highly correlated with seats and 1 other fields | High correlation |
| power BHP is highly correlated with selling_price and 1 other fields | High correlation |
| selling_price is highly correlated with power BHP | High correlation |
| seats is highly correlated with engine CC | High correlation |
| mileage KMPL is highly correlated with engine CC | High correlation |
| engine CC is highly correlated with seats and 2 other fields | High correlation |
| power BHP is highly correlated with selling_price and 1 other fields | High correlation |
| year is highly correlated with selling_price | High correlation |
| selling_price is highly correlated with year | High correlation |
| engine CC is highly correlated with power BHP | High correlation |
| power BHP is highly correlated with engine CC | High correlation |
| year is highly correlated with owner | High correlation |
| selling_price is highly correlated with owner and 1 other fields | High correlation |
| fuel is highly correlated with engine CC | High correlation |
| transmission is highly correlated with power BHP | High correlation |
| owner is highly correlated with year and 1 other fields | High correlation |
| seats is highly correlated with mileage KMPL and 1 other fields | High correlation |
| mileage KMPL is highly correlated with seats and 1 other fields | High correlation |
| engine CC is highly correlated with fuel and 3 other fields | High correlation |
| power BHP is highly correlated with selling_price and 2 other fields | High correlation |
| df_index has unique values | Unique |
Reproduction
| Analysis started | 2022-04-29 02:11:15.641669 |
|---|---|
| Analysis finished | 2022-04-29 02:11:24.449884 |
| Duration | 8.81 seconds |
| Software version | pandas-profiling v3.1.0 |
| Download configuration | config.json |
df_index
Numeric
| Distinct | 6717 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3957.142177 |
| Minimum | 0 |
|---|---|
| Maximum | 8125 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 52.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 350.8 |
| Q1 | 1915 |
| median | 3869 |
| Q3 | 5992 |
| 95-th percentile | 7669.2 |
| Maximum | 8125 |
| Range | 8125 |
| Interquartile range (IQR) | 4077 |
Descriptive statistics
| Standard deviation | 2361.800637 |
|---|---|
| Coefficient of variation (CV) | 0.596845029 |
| Kurtosis | -1.222934817 |
| Mean | 3957.142177 |
| Median Absolute Deviation (MAD) | 2038 |
| Skewness | 0.05171443885 |
| Sum | 26580124 |
| Variance | 5578102.25 |
| Monotonicity | Strictly increasing |
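Several of the rows above are simple functions of one another: the range is maximum minus minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the MAD is the median of absolute deviations from the median. The sketch below computes them with the standard library. It assumes the sample standard deviation (n − 1 denominator) and linearly interpolated quartiles; the report does not state which conventions it uses, and `describe` is a hypothetical helper name.

```python
import statistics


def describe(values):
    """Return a handful of descriptive statistics for a list of numbers."""
    ordered = sorted(values)
    mean = statistics.fmean(ordered)
    std = statistics.stdev(ordered)  # assumption: sample standard deviation (n - 1)
    median = statistics.median(ordered)
    # Assumption: quartiles by linear interpolation between data points.
    q1, _, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    mad = statistics.median([abs(v - median) for v in ordered])
    return {
        "range": ordered[-1] - ordered[0],
        "iqr": q3 - q1,
        "cv": std / mean,
        "mad": mad,
    }


# Hypothetical toy data, not taken from the profiled dataset.
stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
```

As a consistency check on the table above, dividing the reported standard deviation by the reported mean, 2361.800637 / 3957.142177, gives approximately 0.5968, which agrees with the CV row.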
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 5381 | 1 | < 0.1% |
| 5353 | 1 | < 0.1% |
| 5352 | 1 | < 0.1% |
| 5350 | 1 | < 0.1% |
| 5349 | 1 | < 0.1% |
| 5348 | 1 | < 0.1% |
| 5347 | 1 | < 0.1% |
| 5346 | 1 | < 0.1% |
| 5345 | 1 | < 0.1% |
| Other values (6707) | 6707 | 99.9% |
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 1 | 1 | < 0.1% |
| 2 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 7 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 8125 | 1 | < 0.1% |
| 8124 | 1 | < 0.1% |
| 8123 | 1 | < 0.1% |
| 8122 | 1 | < 0.1% |
| 8121 | 1 | < 0.1% |
| 8120 | 1 | < 0.1% |
| 8119 | 1 | < 0.1% |
| 8118 | 1 | < 0.1% |
| 8116 | 1 | < 0.1% |
| 8115 | 1 | < 0.1% |
name
Categorical
| Distinct | 1982 |
|---|---|
| Distinct (%) | 29.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.6 KiB |
| Maruti Swift Dzire VDI | 118 |
|---|---|
| Maruti Alto 800 LXI | 76 |
| Maruti Alto LXi | 69 |
| Maruti Swift VDI | 60 |
| Maruti Alto K10 VXI | 47 |
| Other values (1977) | 6347 |
Length
| Max length | 54 |
|---|---|
| Median length | 24 |
| Mean length | 25.19994045 |
| Min length | 11 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 898 |
|---|---|
| Unique (%) | 13.4% |
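The report distinguishes "Distinct" (the number of different values) from "Unique" (the number of values that occur exactly once). A minimal sketch of that distinction, using a hypothetical helper `distinct_and_unique` and made-up sample values:

```python
from collections import Counter


def distinct_and_unique(values):
    """Count distinct values and values that appear exactly once."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return {
        "distinct": distinct,
        "distinct_pct": 100 * distinct / n,
        "unique": unique,
        "unique_pct": 100 * unique / n,
    }


# Hypothetical sample, not the full column.
sample_names = [
    "Maruti Swift Dzire VDI",
    "Maruti Swift Dzire VDI",
    "Maruti Alto LXi",
    "Hyundai i20 Magna",
]
summary = distinct_and_unique(sample_names)
```

Applied to the figures in this section, 1982 distinct values out of 6717 rows is about 29.5%, and 898 values occurring once is about 13.4%, matching the tables.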
Sample
| 1st row | Maruti Swift Dzire VDI |
|---|---|
| 2nd row | Skoda Rapid 1.5 TDI Ambition |
| 3rd row | Honda City 2017-2020 EXi |
| 4th row | Hyundai i20 Sportz Diesel |
| 5th row | Maruti Swift VXI BSIII |
Common Values
| Value | Count | Frequency (%) |
| Maruti Swift Dzire VDI | 118 | 1.8% |
| Maruti Alto 800 LXI | 76 | 1.1% |
| Maruti Alto LXi | 69 | 1.0% |
| Maruti Swift VDI | 60 | 0.9% |
| Maruti Alto K10 VXI | 47 | 0.7% |
| Hyundai EON Era Plus | 44 | 0.7% |
| Maruti Wagon R VXI BS IV | 43 | 0.6% |
| Maruti Alto LX | 43 | 0.6% |
| Maruti Ertiga VDI | 42 | 0.6% |
| Maruti Ritz VDi | 40 | 0.6% |
| Other values (1972) | 6135 | 91.3% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| maruti | 2089 | 6.6% |
| hyundai | 1214 | 3.8% |
| mahindra | 709 | 2.2% |
| tata | 633 | 2.0% |
| swift | 620 | 2.0% |
| diesel | 545 | 1.7% |
| bsiv | 542 | 1.7% |
| 1.2 | 502 | 1.6% |
| vxi | 476 | 1.5% |
| plus | 475 | 1.5% |
| Other values (825) | 23905 | 75.4% |
year
Numeric
| Distinct | 27 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2013.611136 |
| Minimum | 1994 |
|---|---|
| Maximum | 2020 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 52.6 KiB |
Quantile statistics
| Minimum | 1994 |
|---|---|
| 5-th percentile | 2007 |
| Q1 | 2011 |
| median | 2014 |
| Q3 | 2017 |
| 95-th percentile | 2019 |
| Maximum | 2020 |
| Range | 26 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 3.897401569 |
|---|---|
| Coefficient of variation (CV) | 0.001935528414 |
| Kurtosis | 1.166627829 |
| Mean | 2013.611136 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | -0.931471173 |
| Sum | 13525426 |
| Variance | 15.18973899 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=27)
| Value | Count | Frequency (%) |
| 2017 | 802 | 11.9% |
| 2016 | 691 | 10.3% |
| 2015 | 680 | 10.1% |
| 2018 | 607 | 9.0% |
| 2014 | 580 | 8.6% |
| 2012 | 576 | 8.6% |
| 2013 | 560 | 8.3% |
| 2011 | 535 | 8.0% |
| 2010 | 361 | 5.4% |
| 2019 | 347 | 5.2% |
| Other values (17) | 978 | 14.6% |
| Value | Count | Frequency (%) |
| 1994 | 2 | < 0.1% |
| 1995 | 1 | < 0.1% |
| 1996 | 2 | < 0.1% |
| 1997 | 9 | 0.1% |
| 1998 | 9 | 0.1% |
| 1999 | 13 | 0.2% |
| 2000 | 14 | 0.2% |
| 2001 | 6 | 0.1% |
| 2002 | 19 | 0.3% |
| 2003 | 36 | 0.5% |
| Value | Count | Frequency (%) |
| 2020 | 63 | 0.9% |
| 2019 | 347 | 5.2% |
| 2018 | 607 | 9.0% |
| 2017 | 802 | 11.9% |
| 2016 | 691 | 10.3% |
| 2015 | 680 | 10.1% |
| 2014 | 580 | 8.6% |
| 2013 | 560 | 8.3% |
| 2012 | 576 | 8.6% |
| 2011 | 535 | 8.0% |
selling_price
Numeric
| Distinct | 670 |
|---|---|
| Distinct (%) | 10.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 526385.997 |
| Minimum | 29999 |
|---|---|
| Maximum | 10000000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 52.6 KiB |
Quantile statistics
| Minimum | 29999 |
|---|---|
| 5-th percentile | 110000 |
| Q1 | 250000 |
| median | 420000 |
| Q3 | 650000 |
| 95-th percentile | 1200000 |
| Maximum | 10000000 |
| Range | 9970001 |
| Interquartile range (IQR) | 400000 |
Descriptive statistics
| Standard deviation | 523550.4483 |
|---|---|
| Coefficient of variation (CV) | 0.994613176 |
| Kurtosis | 52.48996792 |
| Mean | 526385.997 |
| Median Absolute Deviation (MAD) | 185000 |
| Skewness | 5.57076348 |
| Sum | 3535734742 |
| Variance | 2.741050719 × 10^11 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 300000 | 208 | 3.1% |
| 350000 | 196 | 2.9% |
| 600000 | 167 | 2.5% |
| 400000 | 164 | 2.4% |
| 250000 | 161 | 2.4% |
| 550000 | 160 | 2.4% |
| 500000 | 160 | 2.4% |
| 450000 | 147 | 2.2% |
| 650000 | 145 | 2.2% |
| 200000 | 134 | 2.0% |
| Other values (660) | 5075 | 75.6% |
| Value | Count | Frequency (%) |
| 29999 | 1 | < 0.1% |
| 30000 | 1 | < 0.1% |
| 31000 | 1 | < 0.1% |
| 31504 | 1 | < 0.1% |
| 33351 | 1 | < 0.1% |
| 35000 | 3 | < 0.1% |
| 39000 | 1 | < 0.1% |
| 40000 | 11 | 0.2% |
| 42000 | 2 | < 0.1% |
| 45000 | 21 | 0.3% |
| Value | Count | Frequency (%) |
| 10000000 | 1 | < 0.1% |
| 7200000 | 1 | < 0.1% |
| 6523000 | 1 | < 0.1% |
| 6223000 | 1 | < 0.1% |
| 6000000 | 3 | < 0.1% |
| 5923000 | 1 | < 0.1% |
| 5850000 | 1 | < 0.1% |
| 5830000 | 1 | < 0.1% |
| 5800000 | 2 | < 0.1% |
| 5500000 | 4 | 0.1% |
km_driven
Numeric
| Distinct | 898 |
|---|---|
| Distinct (%) | 13.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 73398.33765 |
| Minimum | 1 |
|---|---|
| Maximum | 2360457 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 52.6 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 11500 |
| Q1 | 38000 |
| median | 68203 |
| Q3 | 100000 |
| 95-th percentile | 155000 |
| Maximum | 2360457 |
| Range | 2360456 |
| Interquartile range (IQR) | 62000 |
Descriptive statistics
| Standard deviation | 58703.27527 |
|---|---|
| Coefficient of variation (CV) | 0.7997902561 |
| Kurtosis | 397.3335813 |
| Mean | 73398.33765 |
| Median Absolute Deviation (MAD) | 31797 |
| Skewness | 11.91618609 |
| Sum | 493016634 |
| Variance | 3446074527 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 120000 | 487 | 7.3% |
| 70000 | 420 | 6.3% |
| 80000 | 405 | 6.0% |
| 60000 | 383 | 5.7% |
| 50000 | 346 | 5.2% |
| 100000 | 305 | 4.5% |
| 90000 | 301 | 4.5% |
| 40000 | 280 | 4.2% |
| 110000 | 256 | 3.8% |
| 30000 | 213 | 3.2% |
| Other values (888) | 3321 | 49.4% |
| Value | Count | Frequency (%) |
| 1 | 1 | < 0.1% |
| 1000 | 5 | 0.1% |
| 1300 | 1 | < 0.1% |
| 1303 | 1 | < 0.1% |
| 1500 | 2 | < 0.1% |
| 1600 | 1 | < 0.1% |
| 1620 | 1 | < 0.1% |
| 2000 | 7 | 0.1% |
| 2118 | 1 | < 0.1% |
| 2136 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 2360457 | 1 | < 0.1% |
| 1500000 | 1 | < 0.1% |
| 577414 | 1 | < 0.1% |
| 500000 | 2 | < 0.1% |
| 475000 | 1 | < 0.1% |
| 440000 | 1 | < 0.1% |
| 426000 | 1 | < 0.1% |
| 380000 | 1 | < 0.1% |
| 376412 | 1 | < 0.1% |
| 375000 | 1 | < 0.1% |
fuel
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.6 KiB |
| Diesel | 3658 |
|---|---|
| Petrol | 2973 |
| CNG | 51 |
| LPG | 35 |
Length
| Max length | 6 |
|---|---|
| Median length | 6 |
| Mean length | 5.961589996 |
| Min length | 3 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Diesel |
|---|---|
| 2nd row | Diesel |
| 3rd row | Petrol |
| 4th row | Diesel |
| 5th row | Petrol |
Common Values
| Value | Count | Frequency (%) |
| Diesel | 3658 | 54.5% |
| Petrol | 2973 | 44.3% |
| CNG | 51 | 0.8% |
| LPG | 35 | 0.5% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
| diesel | 3658 | 54.5% |
| petrol | 2973 | 44.3% |
| cng | 51 | 0.8% |
| lpg | 35 | 0.5% |
seller_type
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.6 KiB |
| Individual | 6024 |
|---|---|
| Dealer | 666 |
| Trustmark Dealer | 27 |
Length
| Max length | 16 |
|---|---|
| Median length | 10 |
| Mean length | 9.627512282 |
| Min length | 6 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Individual |
|---|---|
| 2nd row | Individual |
| 3rd row | Individual |
| 4th row | Individual |
| 5th row | Individual |
Common Values
| Value | Count | Frequency (%) |
| Individual | 6024 | 89.7% |
| Dealer | 666 | 9.9% |
| Trustmark Dealer | 27 | 0.4% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
| individual | 6024 | 89.3% |
| dealer | 693 | 10.3% |
| trustmark | 27 | 0.4% |
transmission
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.6 KiB |
| Manual | 6142 |
|---|---|
| Automatic | 575 |
Length
| Max length | 9 |
|---|---|
| Median length | 6 |
| Mean length | 6.256811076 |
| Min length | 6 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Manual |
|---|---|
| 2nd row | Manual |
| 3rd row | Manual |
| 4th row | Manual |
| 5th row | Manual |
Common Values
| Value | Count | Frequency (%) |
| Manual | 6142 | 91.4% |
| Automatic | 575 | 8.6% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
| manual | 6142 | 91.4% |
| automatic | 575 | 8.6% |
owner
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.6 KiB |
| First Owner | 4176 |
|---|---|
| Second Owner | 1888 |
| Third Owner | 493 |
| Fourth & Above Owner | 155 |
| Test Drive Car | 5 |
Length
| Max length | 20 |
|---|---|
| Median length | 11 |
| Mean length | 11.490993 |
| Min length | 11 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | First Owner |
|---|---|
| 2nd row | Second Owner |
| 3rd row | Third Owner |
| 4th row | First Owner |
| 5th row | First Owner |
Common Values
| Value | Count | Frequency (%) |
| First Owner | 4176 | 62.2% |
| Second Owner | 1888 | 28.1% |
| Third Owner | 493 | 7.3% |
| Fourth & Above Owner | 155 | 2.3% |
| Test Drive Car | 5 | 0.1% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
| owner | 6712 | 48.8% |
| first | 4176 | 30.4% |
| second | 1888 | 13.7% |
| third | 493 | 3.6% |
| fourth | 155 | 1.1% |
| & | 155 | 1.1% |
| above | 155 | 1.1% |
| test | 5 | < 0.1% |
| drive | 5 | < 0.1% |
| car | 5 | < 0.1% |
seats
Numeric
| Distinct | 9 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.434271252 |
| Minimum | 2 |
|---|---|
| Maximum | 14 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 52.6 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 5 |
| median | 5 |
| Q3 | 5 |
| 95-th percentile | 7 |
| Maximum | 14 |
| Range | 12 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.9838050408 |
|---|---|
| Coefficient of variation (CV) | 0.1810371612 |
| Kurtosis | 3.608223114 |
| Mean | 5.434271252 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 1.919315131 |
| Sum | 36502 |
| Variance | 0.9678723582 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=9)
| Value | Count | Frequency (%) |
| 5 | 5254 | 78.2% |
| 7 | 966 | 14.4% |
| 8 | 221 | 3.3% |
| 4 | 124 | 1.8% |
| 9 | 74 | 1.1% |
| 6 | 57 | 0.8% |
| 10 | 18 | 0.3% |
| 2 | 2 | < 0.1% |
| 14 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 2 | 2 | < 0.1% |
| 4 | 124 | 1.8% |
| 5 | 5254 | 78.2% |
| 6 | 57 | 0.8% |
| 7 | 966 | 14.4% |
| 8 | 221 | 3.3% |
| 9 | 74 | 1.1% |
| 10 | 18 | 0.3% |
| 14 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 14 | 1 | < 0.1% |
| 10 | 18 | 0.3% |
| 9 | 74 | 1.1% |
| 8 | 221 | 3.3% |
| 7 | 966 | 14.4% |
| 6 | 57 | 0.8% |
| 5 | 5254 | 78.2% |
| 4 | 124 | 1.8% |
| 2 | 2 | < 0.1% |
mileage KMPL
Numeric
| Distinct | 376 |
|---|---|
| Distinct (%) | 5.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 19.49667832 |
| Minimum | 9 |
|---|---|
| Maximum | 30.46 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 52.6 KiB |
Quantile statistics
| Minimum | 9 |
|---|---|
| 5-th percentile | 12.9 |
| Q1 | 16.8 |
| median | 19.46658478 |
| Q3 | 22.5 |
| 95-th percentile | 25.83 |
| Maximum | 30.46 |
| Range | 21.46 |
| Interquartile range (IQR) | 5.7 |
Descriptive statistics
| Standard deviation | 3.915238225 |
|---|---|
| Coefficient of variation (CV) | 0.2008156549 |
| Kurtosis | -0.510254775 |
| Mean | 19.49667832 |
| Median Absolute Deviation (MAD) | 2.853415215 |
| Skewness | 0.007906554036 |
| Sum | 130959.1883 |
| Variance | 15.32909036 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 18.9 | 210 | 3.1% |
| 19.7 | 168 | 2.5% |
| 18.6 | 150 | 2.2% |
| 21.1 | 147 | 2.2% |
| 17 | 124 | 1.8% |
| 15.96 | 108 | 1.6% |
| 16.1 | 106 | 1.6% |
| 17.8 | 96 | 1.4% |
| 12.8 | 88 | 1.3% |
| 15.1 | 86 | 1.3% |
| Other values (366) | 5434 | 80.9% |
| Value | Count | Frequency (%) |
| 9 | 4 | 0.1% |
| 9.5 | 1 | < 0.1% |
| 10 | 2 | < 0.1% |
| 10.1 | 2 | < 0.1% |
| 10.5 | 17 | 0.3% |
| 10.71 | 1 | < 0.1% |
| 10.75 | 2 | < 0.1% |
| 10.8 | 1 | < 0.1% |
| 10.9 | 3 | < 0.1% |
| 10.91 | 4 | 0.1% |
| Value | Count | Frequency (%) |
| 30.46 | 2 | < 0.1% |
| 28.4 | 85 | 1.3% |
| 28.09 | 31 | 0.5% |
| 27.62 | 6 | 0.1% |
| 27.4 | 4 | 0.1% |
| 27.39 | 24 | 0.4% |
| 27.3 | 10 | 0.1% |
| 27.28 | 13 | 0.2% |
| 26.83 | 2 | < 0.1% |
| 26.8 | 3 | < 0.1% |
engine CC
Numeric
| Distinct | 121 |
|---|---|
| Distinct (%) | 1.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1433.989377 |
| Minimum | 793 |
|---|---|
| Maximum | 3604 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 52.6 KiB |
Quantile statistics
| Minimum | 793 |
|---|---|
| 5-th percentile | 796 |
| Q1 | 1197 |
| median | 1248 |
| Q3 | 1498 |
| 95-th percentile | 2499 |
| Maximum | 3604 |
| Range | 2811 |
| Interquartile range (IQR) | 301 |
Descriptive statistics
| Standard deviation | 490.9976244 |
|---|---|
| Coefficient of variation (CV) | 0.342399764 |
| Kurtosis | 0.9926247184 |
| Mean | 1433.989377 |
| Median Absolute Deviation (MAD) | 245 |
| Skewness | 1.232357125 |
| Sum | 9632106.646 |
| Variance | 241078.6671 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1248 | 907 | 13.5% |
| 1197 | 698 | 10.4% |
| 796 | 420 | 6.3% |
| 998 | 398 | 5.9% |
| 2179 | 339 | 5.0% |
| 1498 | 336 | 5.0% |
| 1396 | 281 | 4.2% |
| 1199 | 192 | 2.9% |
| 2523 | 183 | 2.7% |
| 1461 | 169 | 2.5% |
| Other values (111) | 2794 | 41.6% |
| Value | Count | Frequency (%) |
| 793 | 6 | 0.1% |
| 796 | 420 | 6.3% |
| 799 | 71 | 1.1% |
| 814 | 111 | 1.7% |
| 909 | 2 | < 0.1% |
| 936 | 34 | 0.5% |
| 993 | 26 | 0.4% |
| 995 | 42 | 0.6% |
| 998 | 398 | 5.9% |
| 999 | 69 | 1.0% |
| Value | Count | Frequency (%) |
| 3604 | 1 | < 0.1% |
| 3498 | 1 | < 0.1% |
| 3198 | 4 | 0.1% |
| 2999 | 2 | < 0.1% |
| 2997 | 2 | < 0.1% |
| 2993 | 13 | 0.2% |
| 2987 | 8 | 0.1% |
| 2982 | 27 | 0.4% |
| 2967 | 8 | 0.1% |
| 2956 | 19 | 0.3% |
power BHP
Numeric
| Distinct | 318 |
|---|---|
| Distinct (%) | 4.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 87.76610019 |
| Minimum | 32.8 |
|---|---|
| Maximum | 400 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 52.6 KiB |
Quantile statistics
| Minimum | 32.8 |
|---|---|
| 5-th percentile | 47.3 |
| Q1 | 67.1 |
| median | 81.83 |
| Q3 | 100 |
| 95-th percentile | 147.9 |
| Maximum | 400 |
| Range | 367.2 |
| Interquartile range (IQR) | 32.9 |
Descriptive statistics
| Standard deviation | 31.72455521 |
|---|---|
| Coefficient of variation (CV) | 0.3614670714 |
| Kurtosis | 5.426867236 |
| Mean | 87.76610019 |
| Median Absolute Deviation (MAD) | 14.79 |
| Skewness | 1.711327357 |
| Sum | 589524.895 |
| Variance | 1006.447403 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 74 | 324 | 4.8% |
| 88.5 | 193 | 2.9% |
| 46.3 | 158 | 2.4% |
| 67 | 152 | 2.3% |
| 67.1 | 141 | 2.1% |
| 81.8 | 136 | 2.0% |
| 67.04 | 136 | 2.0% |
| 70 | 134 | 2.0% |
| 47.3 | 131 | 2.0% |
| 62.1 | 130 | 1.9% |
| Other values (308) | 5082 | 75.7% |
| Value | Count | Frequency (%) |
| 32.8 | 2 | < 0.1% |
| 34.2 | 20 | 0.3% |
| 35 | 19 | 0.3% |
| 35.5 | 2 | < 0.1% |
| 37 | 88 | 1.3% |
| 37.48 | 11 | 0.2% |
| 37.5 | 6 | 0.1% |
| 38 | 2 | < 0.1% |
| 38.4 | 2 | < 0.1% |
| 40.3 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 400 | 1 | < 0.1% |
| 282 | 1 | < 0.1% |
| 280 | 1 | < 0.1% |
| 272 | 1 | < 0.1% |
| 270.9 | 3 | < 0.1% |
| 265 | 1 | < 0.1% |
| 261.4 | 4 | 0.1% |
| 258 | 2 | < 0.1% |
| 254.8 | 3 | < 0.1% |
| 254.79 | 1 | < 0.1% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
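The calculation described above can be sketched in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. The helper names (`average_ranks`, `spearman_rho`) are hypothetical, and ties are assumed to receive the mean of the ranks they span.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def _pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank variables."""
    return _pearson(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship: y = x cubed.
x = [1, 2, 3, 4, 5]
rho = spearman_rho(x, [v ** 3 for v in x])
```

Because only the ordering matters, the cubic relationship in the example is treated as a perfect monotonic association.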
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
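A minimal sketch of that formula, together with a check of the invariance property: rescaling and shifting either variable (with a positive scale factor) leaves r unchanged. The function name `pearson_r` and the sample numbers are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# Hypothetical sample data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_original = pearson_r(x, y)
# Change location and scale of each variable separately (positive scale factors).
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
```

The two results agree up to floating-point rounding. Likewise, any exact line with positive slope gives r = 1 regardless of how steep it is.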
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
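The pair-counting definition above translates directly into code. This sketch implements the plain formula as stated (often called τ-a); tied pairs are counted as neither concordant nor discordant. Libraries commonly default to a tie-adjusted variant (for example, SciPy's `kendalltau` uses τ-b unless told otherwise), so results can differ when ties are present. The function name `kendall_tau_a` is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # The sign of the product tells whether x and y move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# One swapped pair out of six: 5 concordant, 1 discordant.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
```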
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation for it is available.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
First rows
| | df_index | name | year | selling_price | km_driven | fuel | seller_type | transmission | owner | seats | mileage KMPL | engine CC | power BHP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Maruti Swift Dzire VDI | 2014 | 450000 | 145500 | Diesel | Individual | Manual | First Owner | 5.00 | 23.40 | 1248.00 | 74.00 |
| 1 | 1 | Skoda Rapid 1.5 TDI Ambition | 2014 | 370000 | 120000 | Diesel | Individual | Manual | Second Owner | 5.00 | 21.14 | 1498.00 | 103.52 |
| 2 | 2 | Honda City 2017-2020 EXi | 2006 | 158000 | 140000 | Petrol | Individual | Manual | Third Owner | 5.00 | 17.70 | 1497.00 | 78.00 |
| 3 | 3 | Hyundai i20 Sportz Diesel | 2010 | 225000 | 127000 | Diesel | Individual | Manual | First Owner | 5.00 | 23.00 | 1396.00 | 90.00 |
| 4 | 4 | Maruti Swift VXI BSIII | 2007 | 130000 | 120000 | Petrol | Individual | Manual | First Owner | 5.00 | 16.10 | 1298.00 | 88.20 |
| 5 | 5 | Hyundai Xcent 1.2 VTVT E Plus | 2017 | 440000 | 45000 | Petrol | Individual | Manual | First Owner | 5.00 | 20.14 | 1197.00 | 81.86 |
| 6 | 6 | Maruti Wagon R LXI DUO BSIII | 2007 | 96000 | 175000 | LPG | Individual | Manual | First Owner | 5.00 | 17.30 | 1061.00 | 57.50 |
| 7 | 7 | Maruti 800 DX BSII | 2001 | 45000 | 5000 | Petrol | Individual | Manual | Second Owner | 4.00 | 16.10 | 796.00 | 37.00 |
| 8 | 8 | Toyota Etios VXD | 2011 | 350000 | 90000 | Diesel | Individual | Manual | First Owner | 5.00 | 23.59 | 1364.00 | 67.10 |
| 9 | 9 | Ford Figo Diesel Celebration Edition | 2013 | 200000 | 169000 | Diesel | Individual | Manual | First Owner | 5.00 | 20.00 | 1399.00 | 68.10 |
Last rows
| | df_index | name | year | selling_price | km_driven | fuel | seller_type | transmission | owner | seats | mileage KMPL | engine CC | power BHP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6707 | 8115 | Maruti 800 AC | 1997 | 40000 | 120000 | Petrol | Individual | Manual | First Owner | 4.00 | 16.10 | 796.00 | 37.00 |
| 6708 | 8116 | Maruti Alto K10 VXI Airbag | 2017 | 340000 | 45000 | Petrol | Individual | Manual | First Owner | 5.00 | 23.95 | 998.00 | 67.10 |
| 6709 | 8118 | Hyundai i20 Magna | 2013 | 380000 | 25000 | Petrol | Individual | Manual | First Owner | 5.00 | 18.50 | 1197.00 | 82.85 |
| 6710 | 8119 | Maruti Wagon R LXI Optional | 2017 | 360000 | 80000 | Petrol | Individual | Manual | First Owner | 5.00 | 20.51 | 998.00 | 67.04 |
| 6711 | 8120 | Hyundai Santro Xing GLS | 2008 | 120000 | 191000 | Petrol | Individual | Manual | First Owner | 5.00 | 17.92 | 1086.00 | 62.10 |
| 6712 | 8121 | Maruti Wagon R VXI BS IV with ABS | 2013 | 260000 | 50000 | Petrol | Individual | Manual | Second Owner | 5.00 | 18.90 | 998.00 | 67.10 |
| 6713 | 8122 | Hyundai i20 Magna 1.4 CRDi | 2014 | 475000 | 80000 | Diesel | Individual | Manual | Second Owner | 5.00 | 22.54 | 1396.00 | 88.73 |
| 6714 | 8123 | Hyundai i20 Magna | 2013 | 320000 | 110000 | Petrol | Individual | Manual | First Owner | 5.00 | 18.50 | 1197.00 | 82.85 |
| 6715 | 8124 | Hyundai Verna CRDi SX | 2007 | 135000 | 119000 | Diesel | Individual | Manual | Fourth & Above Owner | 5.00 | 16.80 | 1493.00 | 110.00 |
| 6716 | 8125 | Maruti Swift Dzire ZDi | 2009 | 382000 | 120000 | Diesel | Individual | Manual | First Owner | 5.00 | 19.30 | 1248.00 | 73.90 |